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Abstract — We address the problem of estimating a random 
vector X from two sets of measurements Y and Z, sucli that 
the estimator is linear in Y. We show that the partially linear 
minimum mean squared error (PLMMSE) estimator does not 
require knowing the joint distribution of X and Y in full, 
but rather only its second-order moments. This renders it of 
potential interest in various applications. We further show that 
the PLMMSE method is minimax-optimal among all estimators 
that solely depend on the second-order statistics of X and Y. 
We demonstrate our approach in the context of recovering 
a signal, which is sparse in a unitary dictionary, from noisy 
observations of it and of a filtered version of it. We show that 
in this setting PLMMSE estimation has a clear computational 
advantage, while its performance is comparable to state-of-the-art 
algorithms. We apply our approach both in static and dynamic 
estimation applications. In the former category, we treat the 
problem of image enhancement from blurred/noisy image pairs, 
where we show that PLMMSE estimation performs only slightly 
worse than state-of-the art algorithms, while running an order of 
magnitude faster. In the dynamic setting, we provide a recursive 
implementation of the estimator and demonstrate its utility in 
the context of tracking maneuvering targets from position and 
acceleration measurements. 

Index Terms — Bayesian estimation, minimum mean squared 
error, linear estimation. 



I. Introduction 

Bayesian estimation is concerned with the prediction of 
a random quantity X based on a set of observations Y, 
which are statistically related to X. It is well known that 
the estimator minimizing the mean squared error (MSE) is 
given by the conditional expectation X — E[X\Y]. There 
are various scenarios, however, in which the minimal MSE 
(MMSE) estimator cannot be used. This can either be due 
to implementation constraints, because of the fact that no 
closed form expression for E[X|y] exists, or due to lack of 
complete knowledge of the joint distribution of X and Y. 
In these cases, one often resorts to linear estimation. The 
appeal of the linear MMSE (LMMSE) estimator is rooted in 
the fact that it possesses an easily implementable closed form 
expression, which merely requires knowledge of the joint first- 
and second-order moments of X and Y. 
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For example, the amount of computation required for cal- 
culating the MMSE estimate of a jump-Markov Gaussian 
random process from its noisy version grows exponentially 
in time fl\. By contrast, the LMMSE estimator in this setting 
possesses a simple recursive implementation, similar to the 
Kalman filter ||2|- A similar problem arises in the area of sparse 
representations, in which the use of sparsity-inducing Gaussian 
mixture priors and of Laplacian priors is very common. The 
complexity of calculating the MMSE estimator under the 
former prior is exponential in the vector's dimension, calling 
for approximate solutions [3 1. The MMSE estimator under the 
latter prior does not possess a closed form expression (2], 
which has motivated the use of alternative estimation strategies 
such as the maximum a-posteriori (MAP) method. 

In practical situations, the reasons for not using the MMSE 
estimator may only apply to a subset of the measurements. 
In these cases, it may be desirable to construct an estimator 
that is linear in part of the measurements and nonlinear 
in the rest. Partially linear estimation was studied in the 
statistical literature in the context of regression [5 |. In this 
line of research, it is assumed that the conditional expectation 
g{y, z) = E[X|y = y,Z = z] is linear in y. The goal, then, 
is to approximate g{x,y) from a set of examples {xi,yi,Zi} 
drawn independently from the joint distribution of X, Y and 
Z. In this paper, our goal is to derive the separable partially 
linear MMSE (PLMMSE) estimator Namely, we do not make 
any assumptions on the structure of the MMSE estimate 
E[X|y, Z], but rather look for the estimator that minimizes 
the MSE among all functions g{x, y) of the form Ay + b{z). 

We show that in certain sparse approximation scenarios, the 
PLMMSE solution may be computed much more efficiently 
than the MMSE estimator We demonstrate the usefulness of 
the sparse PLMMSE both in static and in dynamic estimation 
settings. In the static case, we apply our method to the problem 
of image deblurring from blurred/noisy image pairs [6 |. Here, 
we show that PLMMSE estimation performs only slightly 
worse than state-of-the art methods, but is much faster In 
the dynamic regime, we provide a recursive implementation 
of the PLMMSE solution and demonstrate its usefulness in 
tracking a maneuvering target from position and acceleration 
measurements. We show the advantage of PLMMSE filtering 
over state-of-the-art algorithms when the measurements are 
prone to faults or contain outliers. 

The paper is organized as follows. In Section Ull we present 
the PLMMSE estimator and discuss some of its properties. In 
Section HUl we show that the PLMMSE method is optimal in 
a minimax sense among all estimators that solely rely on the 
second-order statistics of X and Y. In Section |IV] we derive 
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the PLMMSE estimator for recovering a signal, sparse in a 
unitary dictionary, from a pair of observations, one blurred and 
one noisy. In Section |V] we apply our method to the problem 
of image enhancement from blurred/noisy measurement pairs . 
In Section |VI] we apply PLMMSE estimation to tracking 
maneuvering targets. 

II. Partially Linear Estimation 

We denote random variables (RVs) by capital letters. The 
pseudo-inverse of a matrix A is denoted by A\ The mean 
E[X] of an RV X is denoted i^x and the auto-co variance 
matrix Cov{X) E[{X - nx){X - ^J^x)^] of X is denoted 
Txx- Similarly, Txy stands for the cross-covariance matrix 
Cov(X,y) = ¥\{X - ^ix){Y - hyY] of two RVs X and 
Y . The joint cumulative distribution function of X and Y 
is written as Fxvix.y) = P(X < x^Y < y), where 
the inequalities are element-wise. By definition, the marginal 
distribution of X is Fx{x) = Fxy{x, oo). In our setting, X 
is the quantity to be estimated and Y and Z are two sets of 
measurements thereof. The RVs X, Y and Z take values in 
M^^, and R^, respectively. The MSE of an estimator X 
of X is defined as E[||X - X\W 

We begin by considering the most general form of a partially 
linear estimator of X based on Y and Z, which is given by 

X = A{Z)Y + b{Z). (1) 

Here A{z) is a matrix-valued function and h{z) is a vector- 
valued function, so that the realization z of Z is used to choose 
one of a family of linear estimators of x based on y. 

Theorem 1 Consider estimators of X having the form ([T]), 
for some (Bo re I measurable) functions A : M'^ — >■ M*^^^ 
and b : MP — > R*^. Then the estimator minimizing the MSE 
within this class is given by 

X = TxYizTl^y^ziY - E[Y\Z]) + E[X\Z], (2) 

where Txy\z = MX - E{X\Z\){Y - E[F|Z])^|Z] denotes 
the cross-covariance of X and Y given Z and Tyy\z — 
E[{Y - E[y|Z])(y - E[F|Z])^|Z] is the auto-covariance of 
Y given Z. 

Proof: See Appendix lAl ■ 
Note that ^ is indeed of the form of ([TJ with A{Z) ~ 
TxY\zT\.y^z and b{Z) = E[X\Z] - rxy|z4^|2E[y 
As can be seen, although the MMSE solution among the class 
of estimators ([T]i has a simple form, it requires knowing the 
conditional covariance Txy\z, which limits its applicability. 
In particular, this solution cannot be applied in cases where 
we merely know the unconditional covariance Txy- 

To relax this restriction, we next consider separable partially 
linear estimation. Namely, we seek to minimize the MSE 
among all functions of the form 

X = AY + b{Z), (3) 

where A is a deterministic matrix and b{z) is a vector-valued 
function. 

Theorem 2 Consider estimators of X having the form 
for some matrix A G R^^^^ and (Borel measurable) function 
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Fig. 1: The statistical knowledge required for computing the 
PLMMSE estimator 



b : R'^ — > R^^. Then the estimator minimizing the MSE within 
this class is given by 

xPL = r^-rt^f + E[x|z], (4) 

where 

Y = Y -E[Y\Z]. (5) 

Proof: See Appendix IbI ■ 
Note again that dUi is of the form of Q with A = r^^rt ^ 
and b{Z) = E[X|Z] -T^^rt ^E[y |Z]. The major advantage 
of this solution with respect to the non-separable estimator ([T]i, 
is that the only required knowledge regarding the statistical re- 
lation between X and Y is of second-order type. Specifically, 
as we show in Appendix |C] (|4|i can be equivalently written as 

X^^ = (Txy - T^nl^nl) (Tyy - TyNi^Y^i) (y - Yz^^ 
+ (6) 

where we denoted X^^ = E[X\Z] and Y^^ = E[Y\Z]. 
Therefore, all we need to know in order to be able to compute 
the separable PLMMSE estimator ^ is the covariance matrix 
Txy, the conditional expectation E[X|Z] and the marginal 
joint cumulative distribution function Fyz of Y and Z. This 
is illustrated in Fig. [1] In fact, as we show in Section |III1 in 
addition to being optimal among all partially linear methods, 
the PLMMSE solution Q is also optimal in a minimax 
sense among all estimation strategies that rely solely on the 
quantities appearing in Fig. [T] 

The intuition behind ^ is similar to that arising in dynamic 
estimation schemes, such as the Kalman filter Specifically, 
we begin by constructing the estimate E[X|Z] of X based 
on the measurements Z, which minimizes the MSE among 
all functions of Z. Next, we would like to account for Y. 
However, since Z has already been accounted for, we first 
need to subtract from Y all variations caused by Z. This is 
done by constructing the RV Y of (|5]l, which can be thought 
of as the innovation associated with the measurements Y with 
respect to the initial estimate E[X|Z]. Finally, since we want 
an estimate that is partially linear in Y, we update our initial 
estimate with the LMMSE estimate of X based on Y. 

Before discussing the minimax-optimality of the PLMMSE 
estimator, it is insightful to examine several special cases, as 
we do next. 
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a ) Independent measurements: Consider first the case in 
which Y and Z are statistically independent. In this setting, 
Y = Y - hy and therefore the PLMMSE estimator (|4) 
becomes 



X 



PL 



YY 



{Y - ^iy) + E[X\Z] = X^ + X'^ 



NL 



MX, 
(7) 



where Xy denotes the LMMSE estimate of X from Y. Thus, 
in this setting, the PLMMSE estimate reduces to a linear 
combination of the LMMSE estimate Xy and the MMSE 
estimate X^^. The need for subtracting the mean of X arises 
because both Xy ™d -^z^ account for it. Indeed, note that 
E[Xy] = ^] = fj,x, so that without subtraction of fix, 
the estimate X^^ would be biased, with a mean of 2/ix. 

b) Z is independent of X and Y: Suppose next that 
both X and Y are statistically independent of Z. Thus, in 
addition to the fact that Y = Y — fiy, we also have E[X|Z] = 
fix- Consequently, the PLMMSE solution ^ reduces to the 
LMMSE estimate of X given Y: 



X = TxYT^yiY - fiy) + fXx^X\ 



(8) 



c) Y is uncorrelated with X and independent of Z: 
Consider the situation in which Y and Z are statistically 
independent and X and Y are uncorrelated. Then Y = Y—fiy, 
and also T ^y = Txy so that Q becomes the MMSE 
estimate of X from Z: 



X = ¥\X\Z] = X| 



NL 



(9) 



d) X is independent of Z: In situations where X and Z 
are statistically independent, one may be tempted to conclude 
that the PLMMSE estimator should not be a function of Z. 
However, this is not necessarily the case. Specifically, although 
the second term in © becomes the constant E[X|Z] — fix in 
this setting, it is easily verified that = Txy, so that the 
first term in (|4]i does not vanish unless X is uncorrelated with 
Y. As a consequence, the PLMMSE estimator can be written 
as 

X = TxYTl^Y + fix-TxYrl.^nY\Z], (10) 



in which the last term is a function of Z. This should come 
as no surprise, though, because if, for instance, Y = X + Z, 
then the optimal estimate is X = Y — Z, even if X and Z are 
independent. This solution is clearly a function of Z. 

e) X is uncorrelated with Y : A similar phenomenon 
occurs when X and Y are uncorrelated. Indeed in this case, 
so that the first term in (|4|i does not 



XY 



■ x'i'^Yi 



vanish unless X^^ is uncorrelated with Y^^. Consequently, 
the estimator (|4]i can be expressed as 



X = 



"T^NLy-NL r-!~^,~.^ 



YY^ 



■ X'S^YS 



^l^E[Y\Z]+nX\Z], 
(11) 

in which the first term is clearly a linear function of Y. 

f) Additive noise: Perhaps the most widely studied mea- 
surement model corresponds to linear distortion and additive 
noise. Specifically, suppose that 



Y = HX + U, Z = GX + V, 



(12) 



where H e R^^*^ and G e R'^^^ are given matrices 
and U and V are zero-mean RVs such that X, U and V 
are mutually independent. As we show in Section |IV] there 
are situations in which the distribution of X is such that 
the complexity of computing the MMSE estimator E[X|y, Z] 
is huge, whereas the complexity of computing E[X|Z] is 
modest. In these cases one may prefer to resort to PLMMSE 
estimation. This method does not correspond to a convex 
combination of the LMMSE estimate of X from Y and 
the MMSE estimate of X from Z, as might be suspected. 
Indeed, substituting Y = HX + U, we have that Fxy = 
TxxH^ and Tyy = HTxxH^ + ^uu- Furthermore, 
E[Y\Z] = HE[X\Z], so that T^NLyNL = T^NL^NLi?^ and 

= HT H^. Consequently, the PLMMSE 

estimator (|6]l becomes 



X = AY + {I - AH)E[X\Z], 
where I is the identity matrix and A is given by 



(13) 



XX 



■ X X 



X I ii I Txx — Fj^NL^NL j + Tui/j ■ 



(14) 



We see that, as opposed to a convex combination of X^^ 
and Xy, the PLMMSE method reduces to a combination of 
Xz^ and Y. Furthermore, the weights of this combination are 
matrices rather than scalars. 

As a toy example demonstrating this, suppose that X is a 
scalar binary RV taking the values ±1 with equal probability, 
that = G = 1, and that U - J\f{0, a^) and V ~ AA(0, cr^). 
It is easily verified that in this case 



X^^=E[X\Z] 



JV(Z - 1; 0, al) ^ Af{Z + 1; 0, a^) 
MiZ -l;0,al)+MiZ +l;0,al.y 



(15) 

where fJ,, a^) denotes the Gaussian density function with 
mean fi and variance ct^, evaluated at 7. Similarly, 



X}^^''-^{Y-fiy)+flX 



Oy 



1+a^^' 



(16) 



where we used the facts that a\- — a\ + a\j and gxy — 
a\ = \. The PLMMSE estimator ( flj] ). is therefore given by 



(17) 



where 7= {\ - o\^^) j {\ + a}, - a\^y, 



(see (O). Figure |3] 
compares the MSE "attained by the PLMMSE method to that 
of the naive convex-combination estimator 



X' 



aX\r + (1 - a)X 



NL 



(18) 



for all a S [0, 1]. As can be seen, when cry = dy, the MMSE 
of the PLMMSE method is roughly 12% lower than the lowest 
MSE of the naive estimator This advantage becomes less 
significant as ujj and ay draw apart. As mentioned above, 
though, in multi-dimensional problems the PLMMSE method 
uses matrix weights rather than scalars, so that its potential 
for improvement over the naive estimator is yet greater. 
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2: The MSE attained by X^^ of ([TtIi and by X"**'™ of (O as a function of a for several values of au and ay- 
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III. Partial Knowledge of Statistical Relations 



IV. PLMMSE Estimation of Sparse Vectors 



As discussed in Section [III one of the appealing properties 
of the PLMMSE solution is that it does not require knowing 
the entire joint distribution of X and Y, but rather only its 
second-order moments. However, the fact that the PLMMSE 
estimator is merely determined by E[X|Z], Cov{X,Y) and 
Fyziu-.z), does not yet imply that it is optimal among all 
methods that rely solely on these quantities. The question of 
optimality of an estimator with respect to partial knowledge 
regarding the joint distribution of the signal and measurements 
was recently addressed in JTj. One of the notions of optimality 
considered there, which we adopt here as well, follows from 
a worst-case perspective. Specifically, any estimator X = 
g{Y, Z), may attain high MSE under certain distributions 
FxYz{x,y, z) consistent with our knowledge and it may 
attain low MSE under other such distributions. We consider 
an estimator minimax-optimal if its worst-case MSE over the 
set of all feasible distributions is minimal. For example, it 
was shown in |7| that the LMMSE estimator Xy attains the 
minimal possible worst-case MSE over the set of distributions 
FxY{x,y) with given first- and second-order moments. 

In the next theorem we show that the PLMMSE method is 
optimal in the sense that its worst-case MSE over the set of 
all distributions Fxyz{x, y, z) complying with the knowledge 
appearing in Fig. [T] is minimal. 

Theorem 3 Let A be the set of probability distributions of 
(X, y, Z) satisfying 

Coy(X) = Txx, Coy{X,Y) = Txy, E[X|Z] = g(Z), 
FxYz{oo,y,z) = FYz{y,z), (19) 

where Txx ond Txy <^re given matrices, g{z) is a given 
function and Fyziyiz) is a given cumulative distribution 
function. Then, among all estimators of X based on Y and 
Z, the PLMMSE method ^ has the minimal worst-case MSE 



sup E 



X-X 



(20) 



over the set A. 

Proof: See Appendix iDl 



We now demonstrate the usefulness of the PLMMSE es- 
timator in the context of sparse approximations. Specifically, 
consider the situation in which X is known to be sparsely 
representable in a unitary dictionary '3/ e ^mxm sense 
that 

X = *VF (21) 

for some RV W that is sparse with high probability. More 
concretely, we assume that the elements of W are given by 



(22) 



where the RVs {Bi} and {Si} are statistically independent, 
Bi ^ J^{0, 1) and Si — (or take small values) with high 
probability. This assumption is very common in the sparse 
approximation literature. For example, in [8| and [9| the 
variables Si are assumed to follow a Gamma distribution. 
Here, we assume, as in |[3l, that they are binary, such that 
P{St = CTi.i) = 1 - F{S^ = a2,i) = Pr with some cti,, > 0, 
o'2,j > and pi > 0. In particular, setting a2.i = and 
Pi small corresponds to vectors W that are sparse with high 
probability. 

Assume X is observed through two linear systems, as in 
( fT2b . where H is an arbitrary matrix, G is an orthogonal 
matrix satisfying G^G = c?I for some constant a 7^ 0, 
and V and V are Gaussian RVs with Tjjtj — a^I and 
Fyy = TyJ. A practical image enhancement scenario and 
a target tracking situation corresponding to this setting are 
detailed in sections |V] and |Vll respectively. This setting can 
be cast in the standard sparse approximation form as 



X 



(23) 



It is well known that the expression for the MMSE estimate 
E[X|y, in this case generally comprises 2^^ summands, 
which correspond to all different possibilities of sparsity 
patterns in 14^ This renders computation of the MMSE 
estimate prohibitively expensive even for modest values of 
M and consequently various approaches have been devised 
to approximate this solution by a small number of terms (see 
e.g., 13 and references therein). For example, the fast Bayesian 
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matching pursuit (FBMP) algorithm developed in |T| employs 
a search in the tree representing all sparsity patterns in order to 
choose the terms participating in the approximation. We note 
that FBMP, as well as other sparse recovery methods, can 
operate with general measurement and dictionary matrices. 

There are some special cases, however, in which the MMSE 
estimate possesses a simple structure, which can be imple- 
mented efficiently. As we show next, one such case is when 
both the channel's response and the dictionary over which X 
is sparse correspond to orthogonal matrices. As in our setting 
^ is unitary and G is orthogonal, this implies that we can 
efficiently compute the MMSE estimate E[X|Z] of X from Z. 
Therefore, instead of resorting to schemes for approximating 
E[X|F, Z], we can employ the PLMMSE estimator of X based 
on Y and Z, which possesses a closed form expression (see 
( fTsl l) in this situation. This technique is particularly effective 
when the SNR of the observation Y is much worse than that 
of Z, since the MMSE estimate E[X|y, Z] in this case is close 
to being partially linear in Y . Such a setting is demonstrated 
in Section HV-Cl 

A. MMSE Estimate of a Sparse Signal in a Unitary Dictionary 
In our setting 

Z = GX + V = G^W + V, (24) 

with W of ( l22b . Since G and ^ are orthogonal, they are 
invertible, so that 

Z = -*^G^Z (25) 

a 

carries the same information on X as Z does, namely 

E[X|Z] = E[X|Z] *E[W^|Z]. (26) 

Now, for every i = 1, . . . , M, we have that Zi = aWi + Vi, 
where V = a-^^'^G^V" is distributed 7V(0,cr^/). There- 
fore, the set {Zj}j-ti is statistically independent of the pair 
{Wi, Zi) and consequently 

nw,\z]^nw,\z,] 

= E[W,\Z,, S, = ai,,]P(5, = ai^,|ZO 

+ E[W,\Z,,S, - a2,mS^ = 'T2AZ^). (27) 

Under the event Si = aj,i with a fixed j e {1,2}, the RVs 
Wi and Zi are jointly normally distributed with mean zero, 
implying that 



Cov(W„Z,) 



(28) 



Cov(Zi) 

Finally, using Bayes rule, the term P{Si = ai^i\Zi) reduces to 

" N{Z,-Q,a^Gl^ + al)p,+N{Z,-0,cTl^ + al){l^p,) 

(29) 



and, similarly, P(S'i = 0-2,2 l-^j) is given by 

AA(Z,;0,aV2', + af)(l-p,) 

AA(Z,; 0, a^al , + al)p, + AA(Z,; 0, a| , + <jI){1- pi) ' 

(30) 

Substituting and (|28ll into leads to the following 
observation. 

Theorem 4 The MMSE estimate ofX of (EB given Z of ^ 
is 



1 



E[X\Z] = */ \^-^^G' Z ] , 
where f{z) = (/(zi), . . . , /(^Af))^, with 



(31) 



p^ N{ii; 0, a^af i + tr^) + (1 - Pi) N{ii\ 0, aV^ ^ + ct^,) ' 

(32) 

Therefore, if, for example, is a wavelet basis and G = I 
(so that a = 1), then E[X|Z] can be efficiently computed 
by taking the wavelet transform of Z (multiplication by NP^), 
applying a scalar shrinkage function on each of the coefficients 
(namely calculating f{zi) for the ith coefficient) and applying 
the inverse wavelet transform (multiplication by ^) on the 
result. Note that the shrinkage curve (|32t is different than the 
soft-threshold operation, originally proposed in [10]. The latter 
can be obtained as the MAP solution with a Laplacian prior, 
whereas our function corresponds to the MMSE solution with 
a Gaussian mixture prior 

B. PLMMSE Estimate of a Sparse Signal 

Equipped with a closed form expression for E[X|Z], we 
can now obtain an expression for the PLMMSE estimator (fT3T l. 
Specifically, we have that 



■ XX 



WW 



(33) 



where rvFiv is a diagonal matrix with {Tww)i,i = Pif^i i + 
(1 — Pi)cr2 i- Similarly, 



■ x^^X^ 



*Cov(/(Z))*^ 



(34) 



where Cov(/(Z)) is a diagonal matrix whose (j, i) element is 
(3i — Cov{f{Zi)). This is due to the fact that the elements of 
Z are statistically independent and the fact that the function 
/(•) operates element-wise on its argument. Therefore, the 
PLMMSE estimator is given in our setting by equation (fTsl i 
with E[X|Z] of dlB and with the matrix 



* [Tww - Cov(/(Z)) j ^^H^ 

X (if* (t^^ - Cov(/(Z))) *^ff^ 



(35) 



Observe that there is generally no closed form expression 
for the scalars f3i — Cov{f{Zi)), rendering it necessary to 
compute them numerically. Since each j3i is the variance of 
a scalar RV, it can be computed very efficiently, either by 
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approximating the corresponding integral by a sum over a set 
of points on the real line or by Monte Carlo simulation. In 
Section |V] we demonstrate how this can be done in a practical 
scenario. 

An important special case corresponds to the setting in 
which Pi — p, — a\, and cr| j = (t| for every i. In this 
situation, we also have that Pi = /3 for every i. Furthermore, 

Tww = {pcjl + - p)'tI)I (36) 
so that A is simplified to 

A = H^ (hH^ + ./^ . , -l] . (37) 

V pal + {I - p)<tI ~ P J 

As can be seen, in this setting A does not involve multipli- 
cation by or 'S'^. Thus, if H corresponds to a convolution 
operation, then A also corresponds to a filter, which can be 
efficiently applied in the Fourier domain. 

C. Numerical Study 

We now compare via simulations the MSE attained by 
X^^ to that attained by X^^, Xy ^^d the approximation to 
E[X|F, Z] produced by the FBMP method. Since we generate 
the signal X and measurements Y and Z according to the 
assumed model, we do not compare our method to other 
Baysian approaches, such as Bayesian compressive sensing 
(BSC) m and sparse Bayesian learning (SBL) |9|, which 
assume a different generative model. Nevertheless, we note 
that a practical scenario, which deviates from the assumptions 
of all these methods, was studied in [31 and showed that the 
performance of FBMP is commonly better than that of BSC 
and SBL. In terms of running time, FBMP is typically an order 
of magnitude faster than SBL and roughly twice as slow as 
BSC. 

In our experiment £ ]]j256x256 taken to be a 
Hadamard matrix with normalized columns. The matrix H 
corresponded to (circular) convolution with the sequence 
h[n] = cxp{— |n|/8.5} and G was taken to be diagonal. To 
comply with the assumption made in |3| that the columns of 
the measurement matrix are normalized, we normalized the 
columns of H to be of norm 0.99 and set G — 0.01/. We set 
Pi = p, af i ^ af, and cr| ^ = for every i, so that X was truly 
sparse with high probability. Figure |3] depicts the MSE of all 
estimators as a function of the input SNR, which we define as 
101ogio(pcr?/CT^)- As can be seen, the MSE of the PLMMSE 
method is significantly lower then that of X^^ and Xy ^^d 
is very close to that attained by the FBMP method. At low 
SNR levels and low sparsity levels (high p) the performance 
of the PLMMSE method is even sUghtly better than the FBMP 
approach. 

The average running time of the PLMMSE method was 
0.6msec for all tested values of p. The average running 
times of the FBMP method were 52.7msec, 79.6msec and 
125.2msec, respectively, for p = 1/3, p = 1/2 and p = 
2/3. The ratio between the computational cost of the two 
approaches, which was two orders of magnitude in this exper- 
iment, becomes higher as the dimension of X is increased. At 
certain dimensions, such as images of size 512 x 512 (in which 



case M = 512^), the FBMP method becomes impractical 
to apply while PLMMSE estimation can still be used very 
efficiently. 

A word of caution is in place, though. In situations in which 
the SNR of the measurement Y is roughly the same as that 
of Z (or better), the FBMP method is advantageous in terms 
of performance. Therefore in this regime, decision on the use 
of PLMMSE estimation boils down a performance-complexity 
tradeoff. 

V. Application to Image Deblurring with 
Blurred/Noisy Image Pairs 

When taking photos in dim light using a hand-held camera, 
there is a tradeoff between noise and motion blur, which can 
be controlled by tuning the shutter speed. Indeed, when using a 
long exposure time, the image typically comes out blurred due 
to camera shake. On the other hand, with a short exposure time 
(and high camera gain), the image is very noisy. In (|6l it was 
demonstrated how a high quality image can be constructed by 
properly processing two images of the same scene, one blurred 
and one noisy. 

We now show how the PLMMSE approach can be applied 
in this setting to obtain plausible recoveries at a speed sev- 
eral orders of magnitude faster than any other sparsity-based 
method. In our setting X, Y and Z correspond, respectively, 
to the original, blurred (and slightly noisy) and noisy images. 
Thus, the measurement model is that described by (l2Jt . where 
H corresponds to spatial convolution with some blur kernel, 
G ~ I, and U and V correspond to white Gaussian images 
with small and large variances respectively. We further assume 
that the image X is sparse in some orthogonal wavelet basis 
Nf, such that it can be written as in (ISTT l and (|22] |. 

As we have seen, in this setting, the PLMMSE estimator 
can be computed in two stages. In the first stage, we cal- 
culate X^^ ~ E[X|Z] (namely, denoise the image Z) by 
computing the wavelet transform Z = Z, applying the 
scalar shrinkage function (l32l l on each wavelet coefficient, and 
taking the inverse wavelet transform of the result. This stage 
requires knowledge of the parameters {pi}, {cri.i}, {<^2,i] and 
(jy. To this end, we assume that a2.i = for all i (a truly 
sparse image) and that pi and (Ti ^ are the same for wavelets 
coefficients at the same level. In other words, all wavelet 
coefficients of the noisy image Z at level i correspond to 
independent draws from the Gaussian mixture 

/^^(z) =/AA(z;0,aVi% + 4) + (l V)AA(5;0,4). (38) 

Consequently, p^, ai^e and ay can be estimated by expectation 
maximization (EM). In our experiments, we assumed that cry 
is known and thus did not estimate it. 

In the second stage, the denoised image X^^ needs to be 
combined with the blurred image Y using ( fT3T l with A of 
(|35] |. As discussed in Section IIV-BI this can be carried out 
very efficiently if pi — p and cri = cri for all i. For the 
sake of efficiencjQ, we therefore abandon the assumption that 

'The exact solution involving {35) can be computed by using iterative 
techniques for matrix inversion, in which each iteration comprises tiltering 
operations and forward and inverse wavelet transforms. However, we found 
that in most cases this approach leads to improvement of only 0.2dB-0.6dB 
in PSNR and is much more demanding computationally. 
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Pi and (Ji.i vary across wavelet levels and assume henceforth 
that all wavelet coefficients are independent and identically 
distributed. In this case, A corresponds to the filter 



A{uj) = 



(39) 



(a^-/3)|i?HP+a^' 

where is the frequency response of the blur kernel. 

Consequently, the final PLMMSE estimate corresponds to the 
inverse Fourier transform of 



X 



PLMMSE 



(40) 

where Y^{u)) and (tj) denote the Fourier transforms of Y 
and X^^, respectively. In our experiment, we assumed that 
the blur H{lo) and noise variance a"^ are known. In practice, 
they can be estimated from Y and Z, as proposed in (|6l. This 
stage also requires knowing the scalars ct^ = E[l/l^^] and 
/3 = E[/^(z)], which we estimate as 

^ M M 

-^ = iiT.~^'--v, ^=^E/'(-~o. (41) 

i=l 1=1 

Figure |4] demonstrates our approach on the 512 x 512 
Gold-hill image. In this experiment, the blur corresponded to 
a Gaussian kernel with standard deviation 3.2. To model a 
situation in which the noise in Y is due only to quantization 
errors, we chose <tjj = \j\fVl » 0.3 and uv — 45. These 
parameters correspond to a peak signal to noise ratio (PSNR) 
of 25.08dB for the blurred image and 15.G7dB for the noisy 
image. 

We used the orthogonal Symlet wavelet of order 4 and 
employed 10 EM iterations to estimate and (j\ g in each 
wavelet level. The entire process takes 1.1 seconds on a Dual- 
Core 3GHz computer with un-optimized Matlab code. We note 
that our approach can be viewed as a smart combination of 
Wiener filtering for image debluring and wavelet thresholding 
for image denoising, which are among the simplest and fastest 
methods available. Consequently, the running time is at least 
an order of magnitude faster than any other sparsity-based 
method, including the Bayesian approaches FBMP |3|, BCS 
||8] and SBL [9| and fast ii minimization algorithms such as 
NESTA In], GPSR Ull and Bergman iterations ifPi. As an 



example, the authors of f3l reported that FBMP requires a 
runtime of 38 minutes to recover a 128 x 128 image from a 
few thousands of measurements and that GPSR requires 2.7 
minutes for the same task. BCS [[8] was reported to require 

15 seconds for reconstructing a 512 x 512 image from a few 
thousands of samples. 

As can be seen in Figure HI the quality of the recoveries 
corresponding to the denoised image X^^ and deblurred 
image X\ is rather poor with respect to the state-of-the- 
art BM3D debnoising method O and BM3D debluring 
algorithm [15|. Nevertheless, the quality of the joint estimate 
-^PLMMSE surpasses each of these techniques alone. The resid- 
ual deconvolution (RD) algorithnfl proposed in |6| for joint 
debluring and denoising slightly outperforms the PLMMSE 
method in terms of recovery error 

A quantitative comparison on several test images is provided 
in Table H] This comparison shows that the PSNR attained 
by the PLMMSE method is, on average, 0.3dB higher than 
BM3D debluring, 0.4dB higher than BM3D denoising, and 
0.8dB lower than RD. In terms of running times, however, our 
method is, on average, 11 times faster than BM3D deblurring, 

16 times faster than BM3D denoising and 18 times faster 
than RD. Note that RD requires initialization with a denoised 
version of Z, for which purpose we used the BM3D algorithm. 
Consequently, the running time reported in the last column 
of Table I] includes the running time of the BM3D denoising 
method. 

VI. Application to Maneuvering Target Tracking 

Next, we demonstrate PLMMSE estimation in the context 
of maneuvering target tracking. Applications in which there is 
a need to track the kinematic features of a target are ubiqui- 
tous. Often, multiple types of measurements are available. In 
non-cooperative scenarios, these may include range, bearing, 
elevation, range rate (Doppler), and more [16|. In navigation 
applications, measurements may include the signals of global 
navigation satellite systems and inertial sensors (accelerom- 
eters and rate gyros). Such sensors are used in satellites 
as well as in modern cellular phones, tablet computers and 

-In our setting this method does not produce ringing effects and thus the 
additional de-ringing stage proposed in |!6] was not applied. 





Fig. 4: Debluring with a blurred/noisy image pair using PLMMSE estimation and RD |6|. (a) Blurred image Y (top left) and 
noisy image Z (bottom-right), (b) LMMSE-deblurred image Xy (top-left) and MMSE-denoised image (bottom-right), 
(c) BM3D-deblurred image (top left) and BM3D-denoised image (bottom-right), (d) Original image X. (e) PLMMSE estimate 
X^^ from Y and Z. (f) RD recovery from Y and Z. 



vehicles. Measurements of this type can be fused to aid, e.g., 
autonomous navigation ifTTl or traffic monitoring [ 1 8 1 . 

To model the tracking problem one usually defines a state 
vector X{k) comprising the target kinematic data, which 
evolves via the following stochastic linear equation 

X{k + l) = FkX{k) + BkW{k). (42) 

Here, {W^(fc)} is a zero-mean white noise sequence satisfying 
Cov{W{k)) = cr^I for all k and {Fk} and {Bk} are known 
deterministic matrices. For the simplicity of the exposition, we 
assume that X{0) = (modification to other initializations is 
trivial). Suppose that two sets of measurements of the state 
are observed, so that 

where {U{k)} and {V{k)} are mutually independent zero- 
mean white noise sequences satisfying Cov([/(fc)) — a^I and 
Cov(F(fc)) = fTyJ, and {Hk] and {Gk} are given matrices. 



At the nth time instant, the goal is to obtain an estimate 
X{n) of X{n) based on the measurements {y(k)}^^^ and 
{Z {k)}^^^. Ideally, we would like our estimation scheme to 
possess a recursive structure such that X{n) is computed from 
the previous estimate X{n—1) and the current measurements 
Y{n) and Z{n) without needing to store the entire measure- 
ment history. 

A simple, yet popular method for modeling maneuvering 
targets is the dynamic multiple-model method |fT9l in which 
W{k) follows a Gaussian mixture distribution. In this case, 
low intensity noise represents the nominal, non-maneuvering 
motion regime of the target, and high intensity process noise 
represents abrupt maneuvers characterized by increased model 
uncertainty, and caused by, e.g., faults in the actuators of 
an autonomous aerospace system. Unfortunately, the MMSE 
solution does not admit a recursive implementation ||20| in this 
setting. 

One alternative is to resort in these cases to LMMSE 
estimation, whose recursive implementation is given by the 
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TABLE I: Performance of deblurring/denoising on several images. Numbers on the left and right of the slash indicate, 
respectively, PSNR in dB and running time in seconds. 







Ay 








V<AJ 


Boat (512 X 512) 
Lena (512 X 512) 
Mandrill (512 X 512) 
Peppers (512 X 512) 
Mountain (640 X 480) 
Frog (621 X 498) 
Gold-hill (512 X 512) 


25.39 / 0.83 
26.93 / 0.73 

21.40 / 0.64 
26.74 / 0.81 
19.23 / 0.95 
23.23 / 0.94 
25.90 / 0.69 


23.45 / 0.06 
24.59 / 0.03 
20.59 / 0.06 
24.89 / 0.08 
17.69 / 0.09 
22.35 / 0.16 
24.26 / 0.06 


27.85 / 13.52 
29.47 / 13.22 
22.72 / 13.58 
29.49 / 13.14 
20.11 / 15.24 
24.00 / 16.07 
27.52 / 13.41 


28.40 / 10.23 
30.58 / 8.90 
21.78 / 9.57 
29.74 / 8.91 
18.45 / 11.12 
24.40 / 13.37 
28.70 / 9.54 


28.05 / 0.88 
30.58 / 0.81 
22.58 / 0.72 
29.80 / 0.88 
20.03 / 1.05 
24.69 / 1.09 
28.82 / 1.09 


29.22 / 15.31 
31.37 / 15.19 
23.30 / 15.58 
31.52 / 15.03 
20.42 / 17.47 
24.69 / 21.14 
29.09 / 21.14 


Average 


24.12 / 0.81 


22.55 / 0.08 


25.88 / 14.03 


26.01 / 10.23 


26.31 / 0.89 


27.09 / 16.19 



Kalman filter Another option is to employ approximations 
of the MMSE estimate, which can be computed recursively, 
such as the interacting-multiple-model (IMM) filter The 
performance of these methods tends to depend heavily on 
the assumption that the measurement noises are Gaussian. 
When their actual distribution is unknown, their performance 
deteriorates. 

Sometimes, nonetheless, the MMSE estimate can be cal- 
culated in an online manner As shown below, this happens, 
e.g., when the state evolves according to the white acceleration 
model I2T) and available are acceleration measurements. When 
supplied with two sets of measurements, only one of which 
allowing recursive MMSE estimation, it may be advantageous 
to use PLMMSE estimation rather than approximate MMSE 
solutions. In this case, under some mild conditions, the 
PLMMSE estimate can be updated recursively, similarly to 
the Kalman filter 

Suppose that the distribution of {T^(fc)} and {VF(fc)} is 
such that, for any k < n, E[W{k)\Z{l), . . . , Z{n)] ^ 
E[W{k)\Z{k + l)] and that the RVs {E[W{k)\Z{k + l)]} are 
uncorrected. As we discuss in the sequel, this implies that the 
MMSE estimate X^^{n) can be computed recursively from 
X^^{n- 1) and Z{n). Our goal is to compute the PLMMSE 
estimate X^^{n) of X{n) from {Y{k)y^^-^ and {Z{k)y^^^. 
To obtain a recursive implementation, it is insightful to exam- 
ine first the batch PLMMSE solution. To this end, we define 



X 



U = 



fX{l) 

\x{n) 

U{1) 









('■") 




\Y{n)J 






1 H 






\V{n)) 



w = 



/Z{1) 

\Z{n), 
W{0) 

,W{n~l). 



(44) 



Therewith, we have from 



that 



X = ^W, Y = HX + U, Z = GX + V, (45) 



where 



/ Bo 
FiBq 
F2F1B0 





F2B1 



n — 1 n — 2 

\k=l k=l 



\ 







Bn-1 



, (46) 



and 



H = dmg{Hi H2 ■■■ H,, 



G = diag(Gi G2 ■■■ G„) 



(47) 



(48) 



Therefore, as we have seen in (fTjt . the batch PLMMSE 
estimate of X from Y and Z is given in this case by 



E[X\Z] + A{Y - HE[X\Z]) 



(49) 



with the matrix A of 1 

Noting that E[X|Z] = \I>E[Vl^|Z] and taking into account 
our assumption that E[VK(fc - 1)|Z] =E[W{k-l)\Z{k)] and 
the block lower- triangular structure of 'S', we see that the fcth 
element in the vector E[X|Z] is given by 

XfL(fc) ^ Fk-iX^''{k-l)+Bk-iE[W{k~l)\Z{k)]. (50) 

We have thus obtained a recursive computation of X^^{k) 
from Xf^(fc- 1) and Z{k). 

Next, it remains to determine whether the operation of A 
admits a recursive implementation. Before we do so, we recall 
that the LMMSE estimate of X from Y, which is given by 



X' 



cjf,l\ Y 



(51) 



in this case, can be implemented recursively via the Kalman 
filter In our setting, Txx = <^w^^'^ '^^e as- 

sumption that Cov(E[VK|Z]) ~ (31, we have that T ^nl ^nl — 

= {P/a^)Txx- Therefore, the matiix A of the 
PLMMSE estimate reduces from (fT4l l to 



I' ^)TxxH^ 111 



HTxxH^ + a^I 



= TxxH^ [ HTxxH^ 



(52) 



We see that this matrix is the same as that appearing in 
( fSTT l. except for the noise variance which is multiplied here 
by '^'wl^'^'w ~ P)- This implies that multiplication by A 
corresponds to a Kalman filter with higher observation noise. 

We conclude that the complete recursive PLMMSE imple- 
mentation comprises the following steps: 

a) Initialization: ^(0) = 0, X(0) = 0, Pq = 0. 

b) Recursion: For k — 1,2,... perform the routine sum- 
marized in Alg. [T] 
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Algorithm 1 One Cycle of the Recursive PLMMSE. 

Input: Y{k),Z{k),X^^{k- l),X{k~ l),Pk-i 
1: W^^{k - 1) = E[W{k - l)\Z{k)]. 

2: X^^ik) - F,_iXfL(fc _ 1) + B,_,W^^{k - 1). 

3: Y^^{k)=HkX^^. 

4: Y{k) = Y(k) - Y^^{k). 

5: Update X{k — l),Pk~i using Y{k) via a Kalman step; 

Pfe Fk-iPk-iFk-i + (JwBuBl 

Lk = {I-KkHk)Ak-i 
X{k) ^ LkX{k ~ I) + KkY{k) 

P, ^ P,K, [h.PIHI + ^^^I^ Kl 

6: X^^{k) = X^^{k) + X{k). 

Output: X^^{k),X^^{k),X{k),Pk 



A. Example: Tracking a Maneuvering Target From Position 
and Acceleration Observations 

A common way of describing target kinematics is via the 
nearly-constant-velocity model, also known as the white accel- 
eration model [21 1. Focusing on one dimension for simplicity, 
and denoting by P{k) the position of the target at time kT, 
the state vector X{k) (P(fc) P{k - 1) P{k - 2))^ in 
this model evolves according to ( |42] | with 

/2 -1 0\ 
Fk=\l . (53) 
Vo 1 0/ 

If the sampling interval T is small, then, to close extent, = 
(l O) (see e.g., ET\ ). The driving noise Wk in this 
model corresponds to the target's acceleration. 

In many applications |l22T|, the target's velocity changes 
gradually most of the time apart for abrupt transients, which 
occur every once in a while. This behavior can be described 
by letting Wk = SkBk, where Bk - A/'(0, 1) and ¥{Sk = 
(Ji) = 1 — P(S'fc — = p. If (Ji is much larger than tT2 
and p is small then the mean time between consecutive large 
acceleration events is large. In the terminology of the multiple 
model approach mentioned above, each of the components 
in the Gaussian mixture corresponds to a different dynamic 
model. 

Suppose that we observe noisy measurements Y{k) and 
Z{k) of the position P{k) and acceleration W{k — 1), re- 
spectively. These measurements relate to the state vector via 
(Il3]l, with i/fc = (1 0) and Gfc = (1 -2 l). Indeed, 
it is easily verified that in this setting Z{k) — W{k — 1) + 
V{k). Consequently, for n > k, E[W{k)\Z{l), Z{n)] = 
E[W{k)\Z{k + 1)] = f{Z{k + 1)) with the function /(•) of 

Equipped with the matrices Fk, Bk, Hk, and Gk, we 
estimate the state vector of (|42] | using the recursive PLMMSE 
method described above. Note that although X{k) comprises 



only the target's positions at times k, fc — 1, and fc — 2, 
estimating the velocity and acceleration is straightforward as 

P(fc)-P(fc-1) and P(fc)-2P(fc-l)+P(fc-2), respectively. 

We compare the performance of the recursive PLMMSE 
method with the one of a standard KF which provides, 
recursively, the LMMSE optimal estimate of the state using 
the measurement sets {Z{k)} and {Y{k)}, as well as with 
that of the IMM filter 1 1 1 which is known to be extremely 
effective in multiple model estimation problems |fT9l . The 
main idea underlying the IMM algorithm is to maintain a bank 
of primitive Kalman filters, each matched to a different model 
in the given model set. Each filter produces a local estimate 
with an associated error covariance using its initial estimate 
and covariance and the current measurement. The key element 
of the IMM scheme is the interaction block that generates, 
using all local estimates, individual initial conditions for each 
of the primitive filters in the bank. In our case, the two models 
maintained by the IMM method correspond to the two possible 
realizations of Sk such that one model corresponds to the low 
process noise variance representing the nominal target regime 
and the second model for the maneuvering one. 

We simulated a random sequence {X{k)} for k = 
1,...,1000 according to (gUi initialized at X{0) = and 
driven by a process noise {VF(fc)} having, at each time, a two- 
modal Gaussian mixture distribution, with p — 0.05, ci = 10, 
and (T2 — 1. The state position is observed via a Gaussian 
measurement equation with ajj = 5 and the covariance of the 
Gaussian measurement noise of the acceleration, cry, is swept 
from 1 to 15. Averaged over 500 independent Monte Carlo 
runs, the average squared position, velocity, and acceleration 
errors are presented, respectively, on the top, middle, and 
bottom of the left chart of Fig. |5] Outperformed by the IMM, 
the PLMMSE method scores better than the Kalman filter 
since it optimally utilizes the acceleration measurements. It is 
noticeable that at high values of ay, Kalman's and PLMMSE's 
errors coincide indicating that acceleration measurements do 
not carry valuable information in addition to that carried by 
{Y{k)}. 

In many practical scenarios the distribution of the position 
measurement noise is far from Gaussian (see e.g., Il23l and 
references therein). On the right chart of Fig. we present 
the position, velocity and acceleration errors obtained for a 
Gaussian mixture distribution of the position noise having the 
same first- and second-order statistics as before. Such a distri- 
bution may model occasional outlier measurements or sensor 
faults |24|. None of the filters is supplied with this information 
and only the first- and second-order moments are provided to 
the algorithms. Utilizing the position measurements in a linear 
manner, both Kalman and PLMMSE keep the performance 
unchanged relative to the Gaussian case. In a contradistinction, 
the IMM algorithm results in an inferior performance. This 
phenomenon is tightly related to the statement of Theorem |3] 
claiming that the PLMMSE method is ensured to attain a 
smaller worst-case MSE in comparison to any other estimation 
technique provided the appropriate moments are kept constant. 
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(a) (b) 

Fig. 5: Mean squared estimation errors of the position (top), velocity (middle) and acceleration (bottom) vs. using the 
PLMMSE approach (solid), IMM (dashed) and standard KF (dash-dotted), (a) {C/(fc)} have a Gaussian distribution, (b) {[/(fc)} 
have a Gaussian mixture distribution. 



VII. Conclusions 

In this paper we derived the PLMMSE estimator, which 
is the method whose MSE is minimal among all functions 
that are linear in Y . We showed that the PLMMSE solution 
depends only on the joint second-order statistics of X and 
Y , which renders it applicable in a wide variety of situations. 
Furthermore, we showed that this estimator attains the lowest 
worst-case MSE over the set of distributions whose joint 
second-order moments of X and Y are fixed. We demonstrated 
our approach in the context of recovering a vector, which is 
sparse in a unitary dictionary, from a pair of noisy measure- 
ments. In this setting, the PLMMSE solution achieves an MSE 
very close to that attained by iterative approximation strategies, 
such as the FBMP method of 0, and is much cheaper 
computationally. We applied our method to the problems 
of image enhancement from blurred/noisy image pairs and 
maneuvering target tracking from position and acceleration 
measurements. In both applications, we showed that PLMMSE 
estimation performs close to state-of-the-art algorithms. In the 
image enhancement setting, we showed that it can run much 
faster than competing approaches. In the context of target 
tracking, we demonstrated the insensitivity of the solution to 
the distribution of the noise in Y . This property provides 
robustness against sensor faults and outlier measurements, 
problems which are very common in target tracking situations. 

Appendix A 
Proof of Theorem[T] 

Using the smoothing property, the MSE of any estimator of 
the form ([T]i is given by 



E 



E 



\X - A{Z)Y -h{Z)f \Z 



(54) 



Thus, for every specific value z that Z can take, the optimal 
choice of A{z) and h(z) is that minimizing the inner expec- 
tation. The solution to this minimization problem corresponds 



to the LMMSE estimate of X based on Y , under the the joint 
distribution of (X, Y) given Z, concluding the proof. 

Appendix B 
Proof of TheoremE] 

We start by noting that the set B of RVs constituting 
candidate estimates is a closed linear subspace within the 
space of finite-second-order-moment RVs taking values in 
R*^. Therefore, the MMSE estimate X within this subspace, 
which is the projection of the RV X onto S, is the uniquqj RV 
whose estimation error X — Xv& orthogonal to every RV of the 
form AY+b{Z). To demonstrate that X of (|4]i indeed satisfies 
this property, note that the inner product between X ~ X and 
AY + h(Z) is given by 



E 



{X - Xf{AY + h{Z)) 



Tr 



i^\{X-X)Y'^ A^l 

Ti ^\{X ~ X)h{Zf |. 

(55) 



Substituting (|4|i, the expectation within the second term be- 



E 



\yT\^Y + nX\Z]-X)h{Zf 



Yb{Z)^ +E[{E[X\Z]- X)b{Zf 



(56) 



Recall that Y = F — E[F|Z] is the estimation error incurred in 
estimating Y from Z. Consequently, Y and X — E[X|Z] are 
uncorrelated with every function of Z and, in particular, with 
b{Z), so that this expression vanishes. Similarly, substituting 
(|4|i and expressing Y = Y + E[Y\Z], the expectation within 

'in an almost-sure sense. 
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the first summand in d55T l becomes 



E 



r^yTi^Y + E[X\Z]-X]Y^ 



E 



Y{Y + E[Y\Z])'^ 
{X - E[X\Z]){Y + E[Y\Z]f 



(57) 



Being a function of Z, the RV E[y|Z] is uncorrelated with Y 
and X — E[X|Z] so that this expression can be reduced to 



■XY'-YY 



E 



YY' 



-E[{X -E[X\Z])W'^ 



XY^ YY YY 



E 



XY 



XY 



XY 



■ XY 



E 



{X - fix+fix -E[X\Z])Y^ 
iE[X\Z]-fix)Y^ 



0, 



(58) 



where we used the facts that E[Y] = 0, that F^y^^^y^yy 



XY 



Lemma 2], and that Y is uncorrelated with E[X|^] 
(due to the same argument as above). This completes the proof. 

Appendix C 
Derivation of Equation © 

By definition, 

T^y ^E[{X - ^ixW ^E[Y\Z]f] 

= E[iX - fixW -iiY + tiY- nY\Z]f] 

= E[{x-^lx){Y-^lYf]-E[{x-^lx){nY\z]-^lYf] 

= TxY - E[E[(X - Mx)(E[r|Z] - ^iYf\Z]] 
= TxY - E[(E[X|Z] - ^lx){nY\Z] - ^iY f] 



(59) 



where X^^ = E[X\Z] and Yf^ = E[Y\Z]. Here, the fourth 
equality is a result of the smoothing property and the last 
equality follows from the facts that E[E[X|Z]] ~ fix and 
E[E[y|Z]] = jiY- In a similar manner, it is easy to show that 



Tyy — Tyy TyNLyNL . 



(60) 



Using (l60| and the fact that Y is uncorrelated with E[F|Z] 
/ly, we obtain 

Tyy = E[YY^] 

^E[{Y -E\Y\Z])Y'^] 

= E[{Y - ^iY)Y^] - E[(E[y|z] - ^iY)Y^] 



YY 
^YY 



1 ^ 1 ^ 



(61) 



Substituting ( |59] l and ( |6T| i into (01 leads to 



Appendix D 
Proof of Theorem[3] 

Let e{FxYZ,X) = Ep^^^W^X - denote the MSB 

incurred by an estimator X cii X based on Y and Z, when 



the joint distribution of X, Y and Z is FxYz{x,y,z). It is 
easily verified that 

£ {FxY z , Xpi^mmse) ~ Tr{rxx} 

— Tr I (Fxy — NLyNL) (Fyy — FyNLyNL)^ (Fxy — r_Y'~"-F~L) 

"(62) 

for all FxYZ G -A.. Therefore, in particular, (l62i is also the 
worst-case MSB of X^^ over A. We next make use of the 
following lemma. 

Lemma 1 There exists a distribution F^yz 
of distributions satisfying il9i , under which the PLMMSE 
estimate of X based on Y and Z coincides with the MMSE 
estimate E[X\Y, Z\ 

Proof: See Appendix IE] ■ 
Now, any estimator X that is a function of Y and Z satisfies 



sup e{FxYZ,X)>e(F*xY2.X) 



> mm e{F;.Yz,X) 

X 

= e(F^YZ,E[X\Y,Z]) 
= e(Fly^,xPL) 

= max eiFxYZ,X^^ 

FxYzeA 



0, (63) 



where the first line follows from the fact that F 



XYZ 



e A, the 

third line is a result of the fact that the MMSB and PLMMSB 
estimators coincide under F'^y^, and the last line is due to the 
fact that £{FxyZtX^^) is constant as a function of Fxyz 
over A. We have thus established that the worst-case MSB 
of any estimator over A is greater or equal to the worst-case 
MSB of the PLMMSB solution over A, proving that X^^ is 
minimax optimal. 



Appendix B 
Proof of Lemma[T] 

We prove the statement by construction. Let Y and Z be two 
RVs distributed according to Fyz and denote h{Z) = E\Y\Z] 
and Y = Y — h{Z). Let [/ be a zero-mean RV, statistically 
independent of the pair {Y , Z), whose covariance matrix is 



Tuu = Txx - Cov(.g(Z)) 



^ xY^ YY y^' 



(64) 



It can be easily verified that this is the covariance matrix of the 
estimation error of the LMMSB estimate of X = X -E[X\Z] 
based onY ^ Y—E[Y\Z]. Therefore, this is a valid covariance 
matrix. Consider the RVo 



X ^r^yTl^Y + g{Z) + U. 



(65) 



We will show that the so constructed X, Y and Z satisfy 
the constraints ( fT9] l. Indeed, using the fact that U has zero 
mean and is statistically independent of Z, we find that the 
conditional expectation of X of ( |65l ) given Z is 

E[X\Z]^giZ). (66) 

'^Recall that ^"'^ ^YY functions of Cov(X, Y) and Fyz^ which 

are given. 
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Furthermore, since Y, g{Z) and U are pairwise uncorrelated, 
the covariance of X of ( |65l l can be computed as 

Cov(x) = r^^r^^r,>^r^^r^^ + Cov(.g(z)) + Fuu 

= T^yTl.^ry^ + Cov(5(Z)) 

+ Fxx - Cov(5(z)) - r^^rt ^rj.^ 

= rxx, (67) 

where we substituted f64\ . Finally, the cross covariance of X 
of ( |65l l and Y is given by 

Cov(x, Y) - r^^rt ^r^^ + Cov(.9(z), h{z)) 

^r^yTl.^Tyy+Cov{g{Z),h{Z)) 

= r^y + Cov{g{Z),h{Z)) 

= TxY - Cov(5(Z), h{Z)) + Cov{giZ),h{Z)) 

= TxY, (68) 

where the second equality follows from the third line of (l6ll . 
the third equality follows from |25 Lemma 2], and the fourth 
equality follows from (|59] l. Equations (|66] l, ( |67] i and d68T l 
demonstrate that the distribution -FJyz associated with X, 
y and Z, belongs to the family of distributions A satisfying 

Next, we show that the PLMMSE and MMSE estimators 
coincide under F^y^. Indeed, since U is statistically indepen- 
dent of the pair iY,Z), we have that E[U\Y , Z] =E.[U] ^ 0, 
so that 

E[x\Y, z] = r^^r^^E[y |y, z] + ng{z) + u\y, z] 
= r^^^rt ^(y - h{z)) + g{z) + E[c/|y, z] 

= T^yT\^{Y-h{Z))+g{Z), (69) 

where we used the fact that there is a one-to-one transfor- 
mation between the pair {Y,Z) and the pair {Y,Z). This 
expression is partially linear in Y , implying that this is also the 
PLMMSE estimator in this setting. Thus, for the distribution 
^XYZ^ the PLMMSE estimator is optimal not only among all 
partially linear functions, but also among all functions of Y 
and Z. 
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